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Abstract — Numerical issues related to the occurrence of error 
floors in floating-point simulations of belief propagation (BP) 
decoders are examined. Careful processing of messages corre- 
sponding to highly-certain bit values can sometimes reduce error 
floors by several orders of magnitude. Computational solutions 
for properly handling such messages are provided for the sum- 
product algorithm (SPA) and several variants. 

I. Introduction 

Belief propagation (BP) decoders based upon the sum- 
product algorithm (SPA) are widely used to decode error- 
correcting codes that have a sparse graphical representation, 
including, most notably, low-density parity-check (LDPC) 
codes. This paper addresses numerical issues arising in the im- 
plementation of SPA decoding and several of its variants that 
have an impact on the occurrence and severity of frequently 
observed error floors in the decoder performance curves. (For 
background on BP and LDPC coding, the reader is referred 

to (D-m).) 

In Q, MacKay and Postol examined near codewords asso- 
ciated with the error floors that they observed in BP decoding 
of Margulis-type codes over the additive white Gaussian noise 
(AWGN) channel. Shortly thereafter, Richardson p) wrote a 
seminal paper on the error floors of LDPC codes in the setting 
of the AWGN channel and the binary symmetric channel, 
identifying decoder-dependent, error-prone substructures in the 
Tanner graph, dubbed trapping sets, that were responsible for 
the observed floors. Later, Han and Ryan also studied error 
floors and their properties for LDPC codes used on the AWGN 
channel (6). With the exception of (5), which used a 5-bit 
hardware simulation to generate error-rate curves, it appears 
that the error floors reported in these prior studies were found 
with floating-point (FP) simulations of the BP decoder. 

A quantization scheme, such as FP, imposes a limited 
range and finite resolution on the values to be represented. 
Additionally, the use of non-linear functions in a quantized 
environment may dramatically further limit the domains and/or 
images of said functions. This paper addresses these issues of 
range and resolution of the messages in several SPA variants 
implemented in FP. Recent work has shown that the error 
floors of variable-regular LDPC codes, such as the Margulis 
code considered by the authors of (3]-j6), are largely the result 
of numerical problems associated with the processing of highly 
certain messages in the BP decoder implementation (7j-|[9j. 

This work improves the range of several known SPA vari- 
ants, introduces a new SPA variant, and presents simulation 
results for two variable-regular LDPC codes whose error floors 
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are reduced by orders of magnitude by addressing these nu- 
merical range issues. Motivated by these observations, Zhang 
and Siegel 1 10 1 have devised several small-bit-width decoders 
that substantially lower the observed error floors of several 
LDPC codes. These decoders use non-uniform quantization 
techniques that accept reduced message resolution for highly 
certain messages for the sake of increased message range. 

II. Background 

A. Code and Channel 

LDPC codes are defined by the null space of a parity check 
matrix H. The codewords are the set of column vectors C, 
such that c G C satisfies He = over a particular field. A 
given code can be described by many different H matrices. 

The H matrix over GF(2) may be associated with a bipartite 
graph B = (V,C,E), called a Tanner graph, in which the 
vertex set may be partitioned into two disjoint sets V and 
C. The set of variable nodes V represent the symbols of the 
codeword that are sent over the channel and correspond to the 
columns of the parity-check matrix. The set of check nodes C 
enforce the parity-check equations represented by the rows of 
H. Each edge e G E of a Tanner graph joins a variable node 
Vi £ V to a check node cj G C. The (J, i)th entry of H is 1 
if (vi, Cj) £ E and is otherwise. 

Assumption. We are only concerned with codes over the 
binary field GF(2). Also, we assume binary antipodal signaling 
over the AWGN channel for the purposes of illustration, but 
the fundamental numerical issues are independent of these 
assumptions. 

After encoding, each binary element of codeword c is 
transmitted over the AWGN channel as a binary antipodal 
symbol ti G { + 1, — 1}- Every received symbol rj is simply 
the sum ti+rii of the transmitted symbol plus independent and 
identically-distributed (i.i.d.) Gaussian noise of zero-mean and 
variance a 2 . This yields a channel SNR of 1/er 2 or 2REb/N , 
where R is the rate of the code, Ej, is the received energy per 
information bit for coherent bandpass detection, and N is the 
one-sided power spectral density of the noise. 

B. Sum-Product Algorithm (SPA) 

We next describe the sum-product algorithm (SPA) that is 
one of the most widely used forms of BP decoding for LDPC 
codes. For codes with a cycle-free Tanner graph representation, 
SPA decoding on such a graph is optimal, in the sense 
that it is equivalent to symbol-wise, maximum a-posteriori 
(MAP) decoding. Such codes are generally not attractive [11]. 
Although no longer optimal when used to decode LDPC 
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codes whose Tanner graph representations have cycles, SPA 
decoding has nevertheless been found to provide near-optimal 
performance, at least in certain ranges of signal-to-noise ratio. 

As is often done, we will first implement our decoder 
simulation in the log-domain; this removes the need for 
normalization and is also closer to the approximations used in 
hardware implementations. In describing the algorithm, we use 
the notation Af(i) to indicate the set of check nodes adjacent 
to variable node Vi, and we use Af(j)\i to denote all variable 
nodes adjacent to check node c,-, excluding variable node Uj. 

First, the received symbols n are converted to log-likelihood 
ratios (LLRs) defined by 

AM=la P S r f = +1 j. 
P(n\u = -i) 

For the AWGN channel, this becomes A M = 2ri/a 2 . 

The LLR A^ is the initial message passed from variable 
node Vi to each of its adjacent check nodes in the Tanner 
graph. That is, \ [ ^ o] = A H for all i,j such that (v t , c,) € E. 
The iteration counter I is then initialized to 1. 

On the first half-iteration of iteration I, the message sent 
from check node Cj to the adjacent variable node Vi is given 
by 



\f^ j] = 2 tanh" 



LkeSS(j)\i 



A z-i 



(1) 



for each edge in the graph. During the second half-iteration, 
the return message sent from variable node to adjacent 
check node Cj is given by 

x [i^j] =x {i\ + ^2 \f^ k] . (2) 

fc£jV(i)\j 

If the early termination logic detects a codeword or if the 
iteration counter I exceeds the maximum allowed count, the 
iterations are halted. Otherwise, the iteration counter is incre- 
mented by 1, and the new check-to-variable-node messages are 
computed using ([TJ. Upon the completion of the iterations, the 
decision for symbol i, denoted if, is set equal to the sign of 
the incoming message sum at variable node v%, that is, 



if = signf A [i 



ktM(i) 



[i<-k] 



If B has no cycles, the message sum is equivalent to 
In D/ r !* i=+ ^ , the MAP decision statistic. 

P(r|t 4 =-1)' 



C. Floating-Point Notation 

Floating-point (FP) formatting offers an economical com- 
puter representation (i.e., quantization) of real numbers cover- 
ing a wide dynamic range with a significant level of precision. 

The elements of an FP number are stored separately: the 
sign bit, the exponent, and the significand. Table [I] shows the 
basic binary formats included in IEEE Standard 754-2008, 
which uses base-2 representation (T2j. The parameters p and 
emax denote the number of binary digits in the significand 
(i.e., precision) and the maximum exponent, respectively. The 
standard requires the minimum exponent be ernin = 1 — 
emax. Since normalized FP numbers are expressed with the 



TABLE I 

Floating-point basic binary formats of IEEE Standard 754 p"2) . 



Name 


Bits stored 


p bits 


emax 


Single-precision (SP) 


32 


24 


+127 


Double-precision (DP) 


64 


53 


+1023 


Quadruple-precision (QP) 


128 


113 


+16,383 



radix point after the first binary digit of the significand, the 
maximum supported FP value may be computed to be 



p ones 



(1.111. 



(2 - 2 1 - p ) 2 e 



Normalization of binary FP numbers is required by the 
standard when possible. These normalized numbers must have 
1 as the most significant bit of the significand, which need not 
be stored. Very small FP numbers may become subnormal (or 
"denormalized") when they are too small to be normalized. 
Subnormal numbers are supported by the standard and do 
not use the full precision available. The smallest normalized 
positive value is 2 emm and the smallest subnormal positive 
value is 

p—l bits 

(.000 . . . 001) 2 -2 emm = 2 emm+1 -P 

per p2] §3.3]. The available binary FP values in the immediate 
vicinity of 1 are the following: 



1 - 3 • 2- p , 1 - 2 ■ 2- p , 1 - 2- p , 1, and 1 



D. Numerical Problem in FP 



(3) 



Direct implementation of ([T]) yields numerical problems at 
high LLRs. Letting y = tanh(A/2) < 1, we may express y as 



2e 



in order to find when FP quantizes y to 1. Ideally, the values 
of y near 1 are rounded to the closest quantized level listed 
in (pV So to be rounded-up to 1, y > 1 — 2~ p /2 must be 
satisfied, which is equivalent to 



2e 



— A 



< 2- { p +1) and 



A + ln(l + e" A ) > (p + 2)ln2. 



(4) 
(5) 



We may very accurately approximate |5]l by A > (p + 2) In 2 
or 38.1230949 in DP-FP (64-bit IEEE 754) with its p = 53 
bits of precision. 

As an argument of ±1 will cause the tanh -1 function 
to overflow, protection from high magnitude LLRs must be 
added to (|TJ or (|2j to ensure numerical integrity or an 
alternative solution not using tanh -1 must be found. Thus, 
preventing tanh -1 overflow by limiting LLRs will result in 
a maximum producible LLR magnitude. Our examination of 
published error-floor results suggests that such LLR limiting 
(or "saturating" or "clipping") is commonly employed. 
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III. Preferred SPA Solution 

In this section we describe our preferred SPA solution, its 
numerical limits, and speed issues. The relationship known as 
the Jacobian logarithm is 

\n(e x + e y ) = ln(e x ) + In (l + e y - x ) 

= ln(e v ) + In (l + e x ~ y ) (6) 

= max(x, y) + In (1 + exp(— |x — y\)) . 

Using ^ and other identities, an alternative exact pairwise 
check node reduction may be derived fl3) , fl4) . For example, 
if we set Af(j) \ i = {ki, fe}, the check-node message in ([!]) 
can be computed as follows: 



A, -faignA J _ 1 



[*i->i] 



l-l 



J*2->j] 



j] 



H-l 



In 1 



In 1 



exp 



exp 



[ki-yj] 
i-i 



+ A 



l-l 



[ki-*j\ x[fe2->i] 



l-l 



i-i 



(7) 



The check-node-message re-formulation in |7]) contains no 
possibility of overflow, regardless of the LLR magnitude, 
which for DP-FP extends to approximately 1.798 x 10 308 . The 
only potential for overflow in the SPA is now just the addition 
operation within Q at extremely high LLRs. 

The computer implementation of |7]i does not necessarily 
have a large impact on simulation speed. A single hyperbolic 
tangent evaluation consumes nearly the same number of CPU 
cycles as four exponential or logarithmic evaluations on a 
modern processor in our experiments. The most significant 
impact is that (jTjl forces us to organize computations pair- 
wise. Hu et al. present a substantial speed improvement by 
organizing computation pairs onto a trellis by the forward- 
backward algorithm, which computes all the output messages 
of the check node |T3j. 

The next level of potential speed optimization is to approx- 
imate the In (1 + exp(— |x|)) operations, for which results lie 
in the interval (0, In 2]. There has been significant work in 
this area, much of it focused towards hardware implementation 
p3) , p3) . For simulation, we have often adopted the following 
two-piece linear approximation of Richter et al. 1 16 1. 



In 1 



o-\*\ 




0.24 Id 



if \x\ < 2.5 
otherwise. 



We have noted losses of about 0.02 dB in the AWGN waterfall 
region of the FER curve for the (2640, 1320) Margulis code 
with this approximation, while the simulation runs nearly 4 
times faster. This is an acceptable trade-off for a decoder 
simulation tool used to study the error floor. 

IV. Additional SPA Formulations 

In this section we address the numerical issues of other 
versions of the SPA implemented using FP computer process- 



ing. As explained in Section II-D the tanh(A/2) function 
loses accuracy and rounds to 1 for A > (p + 2) In 2. Thus, 
changing to larger FP formats for added precision increases 
the LLR limits only linearly. This section explores alternative 
formulations of SPA, some of which increase LLR dynamic 
range. 



A. Min-Sum Algorithm (MSA) 

The min-sum algorithm (MSA) uses the following approx- 
imation to the check-node-update expression 



A 



>'<-j] 



mm 

keM(j)\i 



[k->j] 
l-l 



n 

k£MQ)\i 



signA^ 1 . 



(8) 



Since this expression is equivalent to (j7]) with the two loga- 
rithmic terms assumed to be zero, |8]l also won't overflow. 
However, the decoder performance losses that have been 
observed by using MSA have ranged from 0.60 to 1.22 dB 
p3J in AWGN, depending on the code. To reduce these losses, 
variations on ([8) known as normalized-BP and offset-BP have 
been used successfully J3J, (15). Note that as LLRs get very 
large, the MSA closely approximates the SPA as the two 
logarithmic terms of (j7]) become relatively small. 

B. Gallager' s Involution Transform (GIT) 

In Gallager proposed another version of the check-node- 
update expression using logarithms to replace the mul- 
tiplications with additions. The resulting check-node update 
becomes 



A 



n si § n 

fceAA(i)\i 



[k-H] 



l-l 



where we define 



$(x) 



4> 



In tanh 



E 



In 



K i-i 



1 + e- 
1 -e- 



(9) 



for x > 0. Since the function <E> (x) is its own inverse, i.e., 
$ ($ (x)) = x, for all x > 0, this technique is sometimes 
called Gallager's involution transform (GIT). Because the 
function $ (x) transforms values between domains in which 
addition is the primary means of computing, this technique 
was originally proposed for low-complexity hardware imple- 
mentation of the SPA. We are also aware of its appearance in 
SPA simulation code, in spite of the fact that addition is no 
faster than multiplication on a modern FP processor. 

Both expressions in |9]l suffer finite-precision problems 



prior to taking the logarithm. As explained in Section II-D 
tanh (A/2) is rounded to 1 for A > (p+ 2) In 2, which would 
yield $ (A) = — In 1 = 0. Similarly, 1 — e~ x and 1 + e~ x are 
rounded to 1 for x > (p+ 1) ln2 and x > pin 2, respectively. 
Thus, the computational limit of LLR magnitude using |9} is 
at best A > (p + 2)ln2. 

C. Gallager's Involution Transform, Amended (GIT2) 
Given the following series expansion in the range x > 0: 

I n (l + e -) = e-*- e ^+ e ^ + ... ) 
the series expansion of ([9]) is readily found to be 



In 



= 2 



- ?> X 



- 5.t: 



(10) 
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Fig. 1. Numerical accuracy of several versions of — lntanh(a;/2) plotted 
as significant bits vs. x (LLR value) using DP-FP computations. 



for x > 0. This can clearly be approximated by a small 
number of terms as x grows large. What has been gained is 
that (10 1 will not round to zero until x > (emax + p)\n2, 



thus providing a substantial increase in LLR dynamic range. 
Also, the value e ~ x + ln2 w oi not begin to lose accuracy as it 
"denormalizes" until x > emax\n2. 

Our next step is to find the cross-over in accuracy between 
Q and (JTOj with a limited number of terms. Make the 
following computations: 



&o( x ) — — lntanh 
$ 1 (x)±2e-*, 

* 2 (a:) = 2e" 



2e 



x 
2 

-3,T 



and 

3 

ei(x)±\$ t (x)-$(z)\/*(x), 

where we compute $(2;) using our best approximation at each 
x and not ^ explicitly. The expression for e, (x) computes the 
relative error of the corresponding computation $>i(x). 

In Fig. [T] we plot the relative error of each computation 
as — log 2 (ei) versus x to show the accuracy in bits of the 
several computational versions of in DP-FP. We observe 
a cross-over of the lower bound of the accuracy of $o(x) 
and the accuracy of the single-term power series $i(x) at 
x w 12.4 and 37.2 bits of accuracy, which is still quite good. 
Thus, computationally, the following approximation to |9]l has 
much greater dynamic range than directly implementing ([9]) 
on a computer, at acceptable accuracy for many applications: 



- lntanh (§) 

,-x+ln2 



< x < 12.4 
x > 12.4. 



(11) 



The value returned by (Hi will not round to zero until x > 
(emax + p)\n2 or 745.8 in DP-FP. If even more accuracy 
is desired, additional terms may be employed at an earlier 
cross-over. For instance, Fig. [T] shows that the cross-over in 
the accuracy of &o(x) and the two-term power series $2(2:) 
occurs at x « 7.35 with over 44 bits of accuracy. However, no 
larger dynamic range is achievable while employing Gallager's 
involution transform. 



Our proposed computation ( fTT) achieves a factor of 
(emax + p)/(p + 2) increase in the LLR dynamic range of 
(|9j, which is approximately a 20-fold improvement for DP-FP. 
Also, at LLR magnitudes larger than 12.4, this simple modi- 
fication achieves higher accuracy with reduced computational 
complexity than |9]). 

D. Likelihood Ratio (LR) 

An alternative to using tanh in the SPA is to perform the 
required computations in the likelihood ratio (LR) domain 
(T7J, fl8|. Interestingly, the non-uniform FP quantization of 
LRs maps to nearly uniform quantization in the LLR domain. 



If we let = expAW, L 
variable-node update becomes 



exp A 



and so on, the 



L 



L 



[i->j] = L [i] 



n 



or 



k<£Af(i)\j 



exp 



lnL' ; 



E 



lnX ' 



(12) 



(13) 



Letting JV(j)\i = {feijfea}, the pairwise check-node update 
may be computed as 



- [i<- j] 



1 



t [ki-yj] T [fc2- 



3] 



-[ki-yj] 
J l-i 



L 



l-i 



(14) 



Andrews [18| notes that multiplicative overflow within (14i 



if the input LRs are limited to L 



l-l 



< 



is avoided 

v / 2 emax (2 - 2 x -p), which corresponds to an LLR limit of 
about 354.89 in DP-FP. We have found a numerical improve- 
ment to this computation which doubles its LLR range. That 
improvement is to appear in the full version of this paper. 



Note that the variable -node update ( 12 1 is more sensitive to 
multiplication overflow than (14i for d v > 3. However, (12i 



may be re-stated as ( 13 1 which has no intermediate overflows. 



We may gracefully limit the argument of the exponential in 
( p"3j ) to restrict the output LR range and prevent overflow. Of 
course, ( fT3] l is more complex to evaluate. 

E. Likelihood Difference (LD) or Tanh Domain 

Another alternative is to perform the computations in the 
likelihood difference (LD) domain, on the interval (—1,-1-1) 
|fT7 |. This is variously known as the tanh or soft-bit domain 
Jljij. If we let $W = P(n\ti = -1) - P(ri\ti = +1) and so 
on, the pairwise variable-node update becomes 

5 [ : ' 



1 



-fci]^[i+ 



while the check-node update is simply 



n 



Xl k ~>3] 
J /-l • 



(15) 



(16) 



feeJV(i)\i 



The dominant numerical issue for the check-node update is 
the resolution of 8 in the neighborhood of 5 — ±1. The LD 
messages closest to absolute certainty in FP are ±(1 — 2~ p ). 
Since LLRs are related to LDs by A = ln(l+(5)-ln(l-<5), the 
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greatest nearly certain message available in LD is equivalent 
to an LLR magnitude of 



A = ln(2 - 2- p ) - ln(2- p ) 
= (p+l)ln2 + ln(l-2~ 



(p + l)ln2, 



(17) 



where our approximation error is less than the available 



resolution of FP. Note that the variable-node update ( 15 i may 
suffer from rounding to ±1 issues and divide by errors 
for highly certain messages, but our focus in this section has 
been on the limitations of the check-node updates. Thus, for 



comparative purposes, we use (17i to represent the equivalent 
LLR limits of this formulation. 

F. Offset-Likelihood Difference (OLD) 

The magnitude of a quantization error in (normalized) FP 
is roughly proportional to the amplitude of the represented 
value. Thus, FP quantizes with greater absolute accuracy close 
to than close to 1. This leads us to propose that LDs be 
offset such that check-node computations are in the form / = 
1 — \6\ = 1 — tanh | A/2 1 , so that highly certain messages are 
near / = 0. Since the check-node update for LD ( 16 1 is odd in 
every input and sign(i5) = sign(A), we may simply handle the 
sign at the conclusion of the check-node-message calculation. 

To begin the derivation of the check-node computation for 
the offset-likelihood difference (OLD) algorithm we note that 



1 



n 



k£Af(j)\i 

simplifies significantly when performed pairwise. Letting 
\ i = {ki, ^2}, the check-node computation becomes 



h ~ h-i Ji-i h-i 



h-i 



(18) 



If we wish to perform the variable-node update in the 
LLR domain, then we note that the transformations between 
domains are relatively simple, since 

2e-l A 'l 2 
/* = ^-qid = (19) 



1 



and 



In 



1 + el A *l 
fi 



(20) 



Algorithm 1 Offset-LD Check-Node Update for One Edge 
l: procedure OFFSET_LD_CN(n, A) 

2: / <— > f holds the magnitude, < / < 1 

3: si — hi > s holds the sign, s e {+1, —1} 

4: for i <— l,n do 

5: .g^2* exp(-|A[i]|)/ [1 + exp(-|A[i]|)] 

6: / <~ f + 9- f*9 

1: s <— s * sign(A[i]) 

8: end for 

9: X out <- S * In (^) 
10: return \ out 

ii: end procedure 



The pairwise calculation of (18 1 generalizes easily to the 
recursion of Algorithm [T] By definition, the range of / is 



TABLE II 

Upper LLR limits of SPA Formulations with respect to Check 
Node Inputs 



Tech- 
nique 



1 

if 

:asa 

GIT 

GIT2 

LR 

LD 

OLD 



LLR-equivalent 
limit (approx.) 



(p + 2)ln2 

<2emax-\-l 

(p + 2)ln2 
(emax + p) In 2 
(emax + 1) In 2/2 
(p + l)ln2 
(emax + p) In 2 



LLR limit 
for DP-FP 



38.12 
1.798 x 10 308 
1.798 x 10 308a 
38.12 
745.8 
354.9 
37.43 
745.8 



LLR limit 
for QP-FP 



79.72 
1.190 x 10 4932 
1.190 x 10 4932a 
79.72 
11434 
5678 
79.02 
11434 



J MSA decoder approximates SPA with a performance loss of 0.6 to 1.22 dB. 

the interval (0,1]. For simplicity Algorithm [T] is written to 
compute a single output message; however, a check node 
must compute an output for each edge. Thus, the transformed 
LLR values from (19i on line 5 of Algorithm [T] may be pre- 
computed (just once) for all output messages. Furthermore, we 
may organize the pairwise computations on a trellis fPT) . 

The computational complexity of Algorithm [T] is less than 
that of Q; however the dynamic range is also less. The 
numerical dynamic range is limited due to underflowing (i.e., 
rounding to zero) 2e ' A ' in (19 1. This was shown in Section 
IV-C to occur as LLRs exceed (emax + p) In 2. 



The other numerical concern is overflowing 2// in (20 1 
when / is so small that it is significantly denormalized. 
However, for small / (e.g., fi < 2~ p ), we may accurately 
restate (20l as |Aj| = ln2 — In/;, which has no major 



numerical concerns and fewer operations. 

G. Summary of SPA Formulations 

Table |H| summarizes the findings on LLR limits with respect 
to check-node inputs. The traditional formulations based on 
tanh operations in ([TJ, Gallager's involution transform, and 
LD are all very limited in their usable LLR range. Our 
preferred approach (|7]i has a range 306 orders of magnitude 
greater. 

Fig. [2] shows the output accuracy of a CN using 64-bit FP 
calculations for several SPA formulations. The MSA approx- 
imation peaks at low LLRs and then improves substantially, 
while offset-BP is more accurate than MSA at low LLR and 
less accurate at high LLR. The approximation by Richter et al. 



1 16] has substantially less error than the other approximations 
throughout the LLR range. Finally, the vertical lines indicate 
two exact formulations reaching their upper LLR limit and 
their error beginning a linear growth with respect to LLR. 

V. Hybrid SPAs and rescaling 

Fig. [2] also suggests that hybrid SPA solutions - that is, 
switching from a precise formulation to an approximation as 
LLRs grow - may be acceptable. In fact, one of our SPA 
decoder implementations in DP-FP switches from the Richter 
et al. approximation to MSA once a very large LLR, say 2 P+3 , 
is first seen. At such a point in decoding, the approximation 
error of the less-complex MSA is less than the resolution of 
the values for most of the messages in the graph. 
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Fig. 2. RMS LLR error versus mean input LLR value, m\, for check node 
calculation with 4 input LLRs having i.i.d. Gaussian distribution with variance 
cr? = 2m \. 
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Fig. 3. 64-bit FP resolution in LLR units versus equivalent LLR A. 



Also, it is simple to push the LLR limits beyond that shown 
in Table [TT] for formulations that already support large LLRs. 
At very large LLRs the update rules are largely insensitive to 
scaling, and MSA is always insensitive to scaling. So, once we 
detect an extremely large LLR in decoding, say 10 305 in DP- 
FR we may rescale all LLRs (including the LLRs from the 
channel) substantially to provide additional headroom. This 
alleviates the need to consider the QP-FP format for more 
LLR range. We recognize that, when quantizing FP in the 
LLR domain (as we prefer) and rescaling, the quantization 
steps get larger as LLR magnitudes increase. We believe this 
effect to be tolerable when operating in the error floor region 
as LLRs grow exponentially in trapping set conditions (7)-|(9)- 



It is approximately proportional to the value represented, with 
a proportionality constant of a — 2 -52 5 = 1.6 x 10~ 16 . 

We can plot the other domains using a simple transforma- 
tion. For instance, in the LR domain (L = e A ) of Section 
IV-D we find the FP-quantization steps of L as a function of 
A are approximately 

AL w aL = ae x in LR units. 

To transform small changes in L to LLR units we need only 
multiply by the magnitude of the derivative \dX/dL\ = l/L 
to produce 



dX 



dL 



AL ss aL/L = a in LLR units, 



approximately a constant value. As shown in Fig.[3]the true LR 
resolution in LLR units dithers about our approximate result 
until it runs out of range. The GIT2 and OLD domains yield a 
resolution overlapping the LR result, but the step-size of GIT2 
is smaller at LLRs less than 0.2. For the LD or tanh domain 
we perform the same analysis and produce the top curve in 
Fig. [3] which shows the step-size growing rapidly. 

VII. Simulation 

We present performance simulation results for two codes 
to demonstrate the techniques of this paper. Fig. [4] shows the 
frame error rate (FER) of the (2640, 1320) Margulis code, 
which is a (3, 6)-regular LDPC code. We show MacKay and 
Postol's Q results in addition to our own. While they show 
the start of an error floor at 10~ 6 at E b /N = 2.4 dB, 
other studies have found the floor to be substantially higher 
(3)' (5)' ©• MacKay and Postol note that this error floor is 
caused by certain near codewords Q. Our own simulation, 
using a non-saturating SPA decoder running for a maximum 
of 200 iterations, showed no sign of a floor down to 10~ 8 and 
significantly lower. In fact, resorting to techniques similar to 
importance sampling we found an error-rate contribution due 
to the supposedly dominant near codewords of just 2 x 10~ n 
at E b /N = 2.8 dB for this decoder (9). 

Fig. [5] shows the FER results for an LDPC code listed 
as 1057.244.3.457 in |j20). It is a (3, 13)-regular code, with 



block length 



1057, rate R « 0.77, and d ri 



Again, we compare our results to those of MacKay. We used 
our non-saturating SPA decoder with three settings for the 
maximum number of iterations: 10 3 , 10 5 , and 10 7 . The average 
number of decoder iterations performed at E b /No — 4.574 
dB was 2.290, 2.344, and 4.08, respectively. Despite the small 
increase in average iterations required, significant performance 
improvements were obtained by using the larger maximum 
number of iterations. 



VI. Floating-Point Resolution 

Since range is not the only issue in quantization, we briefly 
examine floating-point (FP) resolution for the several domains 
covered. Fig. [3] shows the DP-FP resolution (or quantization 
step-size) of several domains plotted in terms of LLR resolu- 
tion versus equivalent LLR. Since the curve for "LLR domain" 
is in its native domain, the resolution takes on discrete values. 



VIII. Conclusion 

We have addressed numerical limitations on the allowable 
size of log-likelihood ratios (LLRs) in floating-point (FP) sim- 
ulations of several formulations of the sum-product algorithm 
(SPA). We have described preferred techniques to accommo- 
date very large LLR values and even unbounded range through 
rescaling. Additionally, we have proposed simple numerical 
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Fig. 4. FER vs. E b /Nq in dB for the Margulis LDPC code in AWGN. 




Fig. 5. FER vs. E b /N in dB for n = 1057 LDPC code in AWGN. 



improvements to Gallager's involution transform that extend 
its dynamic range by a factor of about 20 (for double-precision 
FP computations). Finally, we introduced a new exact SPA 
formulation, "offset-likelihood difference," which supports a 
moderate LLR range with low computational complexity. 
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